Date: 2020-12-10 10:16

Purpose

The purpose of this assignment is to give you practice working with files, building and using

dictionaries, designing functions, and writing unit tests.

Introduction: The Twitterverse

Twitter is a social networking website where users can post very short messages known as

"tweets". Each Twitter user can choose to "follow" other users, which means that they see

those users' tweets. A Twitter user sees the tweets of users they are "following", and their

tweets are seen by their "followers" (the users who follow them).

All the "follow" connections define a network among Twitter users, and it's quite interesting to

look for patterns in the connections. Tools like Twiangulate let you explore questions like

"what connections do my two friends have in common?". In this assignment, you'll write a

program that lets you ask questions (or "queries") about a Twitter dataset.

Any tool for exploring the Twitterverse must get its data from Twitter itself. Twitter

provides an API to allow programmers to write programs that interact with Twitter and extract

data from it. In general, an API is a module that defines functions for accessing underlying

data and performing other tasks without having to know how that data is actually stored and

retrieved.

To make this assignment more manageable for you, we will assume that the information we

need has already been extracted from Twitter and stored in a file.

How To Tackle This Assignment

This is your first experience designing a program of this size. We are providing detailed

advice to help you break the task down into manageable pieces.

Make sure your twitterverse_functions.py module runs without error before submitting. If you

have syntax errors in your module, comment them out before submitting, or we will not be

able to test your functions! Submitted programs that have syntax errors, do not

execute/run, or infinitely loop are subject to a grade of 0.

The Twitter Data File

A Twitter data file contains a series of one or more user profiles, one after the other. Each user

profile has the following elements, in this order:

A line containing a non-blank, non-empty username. You may assume that usernames are

unique, that is, a single username will not occur more than once in the file, and that

usernames do not contain any whitespace.

A line for the user's actual name. If they did not provide a name, this line will be blank.

A line for the user's location, or a blank line if they did not provide one.

A line for the URL of a website, or a blank line if they did not provide one.

Zero or more lines for the user's bio, then a line with nothing but the keyword ENDBIO on it.

This marks the end of the bio, and is not considered part of it. (You may assume that no

bio has the string ENDBIO within it.) If the user did not provide a bio, the ENDBIO line will

come immediately after the website line, with no blank line in between.

Zero or more lines each containing the username of someone that this user is following,

then a line with the keyword END on it. (You may assume that no one has END as their

username.) A user cannot be on his or her own following list. You may assume that every

user on a following list has a user profile in the Twitter data file.

Notice that the keywords act as separators in this file. All of their letters are capitalized, and

the keywords contain no punctuation.

Examples

Here is a sample user profile that might occur among many in a file:

tomCruise

Tom Cruise

Los Angeles, CA

http://www.tomcruise.com

Official TomCruise.com crew tweets. We love you guys!

Visit us at Facebook!

ENDBIO

katieH

NicoleKidman

END

The file data.txt is a smallish example of a complete Twitter data file (and was made by hand)

and the file rdata.txt is a much larger example (and is made from real data extracted from

Twitter). These should help you confirm your understanding of the file format and will also be

useful in testing your program.

Cycles in the data

Although a user cannot be on their own following or followers lists, there can be "loops" (we

call them "cycles") such as this: user A can be following B who is following A. This is the

shortest possible cycle. Of course, cycles can be longer.

The Query File

Note that the word "query" just means "question". In computer science, we use it to mean a

request for information. For this assignment, a query will be provided in a file. Below we will

review the high level parts of the query, look at an example, and then describe the format of

the query file.

Overview

A query has three components: a search specification, a filter specification, and a presentation

specification.

The search specification describes how to generate a list of Twitter usernames, starting with

an initial username (a list of length one) and then finding their followers or people they are

following, then people that are those people's followers or who they are following, and so on.

When processing the search specification, don't try to do anything to avoid cycles. For

instance, if the search specification says to find the people who user A is following, and from

there the people they are following, you could find yourself back at user A. Don't try to avoid

that.

After processing the search specification, we have a list of Twitter usernames. Its length could

be zero. For example, if the initial username is 'adalovelace' and the search specification

contains a single 'followers' keyword, then the length of the list will be zero

if 'adalovelace' has no followers.

The filter specification describes how to filter the list of usernames produced by the search

specification. The filtering can be based on

whether or not they are following a particular user,

whether or not a particular user is their follower,

whether their name contains a particular string (case-insensitive), or

whether their location contains a particular string (case-insensitive).

After processing the filter specification, we have a possibly reduced list of usernames.

Once the search results have been found and filtered, the presentation

specification describes how the output should be presented. It specifies on what basis the

results should be sorted, and whether the results should be presented in a short or long

format.

Example query

Here is an example query:

SEARCH

tomCruise

following

following

following

FILTER

following KatieH

location-includes USA

PRESENT

sort-by popularity

format long

The search specification in this particular query has four steps.

1. Start with a list containing the username to start the search from; i.e., ['tomCruise'] . Let's

call that list L1.

2. The search keyword 'following' says to replace each username p in L1 with the usernames

of the users who p is following. This yields a new list, L2.

3. For the next 'following' keyword, we start with L2 and repeat the same operation as in

the previous step, yielding another list, L3.

4. For the final 'following' keyword, we start with L3 and repeat that operation one last time,

yielding list L4.

Notice that each step yields a list of zero or more usernames that is the input to the next step.

There should be no duplicates in the final results list. Duplicates should be removed after each

step.

The Twitter data file diagram_data.txt contains the follower/following relationships as

represented by this diagram. For those relationships, the search specification above would

yield this list of usernames: ['i', 'j', 'h', 'k', 'tomCruise'] . Make sure that you can see how

the four lists, ending with this final one, are generated. Notice that the final list contains the

users you can get to in three "steps" of the "following" relationship, starting from 'tomCruise' .

The final list generated by the search specification becomes the input to the filter

specification. For our current example, the filter specification says that the list should be

filtered in this way: a user should be kept only if they are following 'KatieH' and have a location

that includes the string 'USA' . The presentation specification says to present the results in

long format and to order the users according to their popularity.

Now let's look in detail at the exact format and meaning of a query.

Overall format of a query

The format of the query data file is as described below, in this order:

The keyword SEARCH alone on a line.

The search specification.

The keyword FILTER alone on a line.

The filter specification.

The keyword PRESENT alone on a line.

The presentation specification.

Notice that the keywords above all act as separators in this file and are in all capital letters.

There are other keywords (such as following which can be part of a search specification) that

are not separators; they are not capitalized.

Format and meaning of the search specification

The format of a search specification is as shown below, in this order:

A single username

Zero or more lines, in any order, each containing one of:

the keyword followers

the keyword following

Each step in the search specification involves modifying the current list of users by going

through each user in the current list and finding their followers or who they are following.

Importantly, at each step, we want to create the new results list by replacing each user from

the current list with the users that they are following or who are their followers.

The possible search operations for a given user are:

followers operation: a list of all usernames for users who are following the given user (i.e.,

that have the given username in the value list paired with the key 'following' in their data

in the Twitterverse dictionary).

following operation: a list of all usernames for users who the given user is following (i.e.,

the value list paired with the key 'following' in the data associated with the given user in

the Twitterverse dictionary).

Note that, if the list of usernames is ['A'] , and we perform

a 'followers' or 'following' operation, 'A' will not be in the resulting list, since users cannot

follow themselves. If 'A' is one of multiple usernames in the current list, then 'A' can appear

in the results list, but only if one of the other usernames produces 'A' from the search

operation.

The final list of results should not contain any duplicates. It is most efficient to remove

duplicates after each operation is performed.
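
The search operations described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the helper names remove_duplicates , followers_of , and apply_operation are hypothetical (they are not part of the starter code), and the three-user dataset is made up and contains only the 'following' key.

```python
from typing import Any, Dict, List


def remove_duplicates(usernames: List[str]) -> List[str]:
    """Return a new list with duplicates dropped, keeping first occurrences."""
    unique = []
    for name in usernames:
        if name not in unique:
            unique.append(name)
    return unique


def followers_of(data: Dict[str, Dict[str, Any]], username: str) -> List[str]:
    """Return the users whose 'following' list contains username."""
    result = []
    for other in data:
        if username in data[other]['following']:
            result.append(other)
    return result


def apply_operation(data: Dict[str, Dict[str, Any]], current: List[str],
                    operation: str) -> List[str]:
    """Replace each user in current with the users they are following
    (operation 'following') or with their followers (operation 'followers'),
    then remove duplicates. A new list is returned; current is not modified.
    """
    new_list = []
    for user in current:
        if operation == 'following':
            new_list.extend(data[user]['following'])
        else:  # assumed to be 'followers'
            new_list.extend(followers_of(data, user))
    return remove_duplicates(new_list)


# Hypothetical data: a follows b; b follows a and c; c follows no one.
data = {
    'a': {'following': ['b']},
    'b': {'following': ['a', 'c']},
    'c': {'following': []},
}
step1 = apply_operation(data, ['a'], 'following')
step2 = apply_operation(data, step1, 'following')
step3 = apply_operation(data, step2, 'followers')
```

Here step2 contains 'a' again because of the cycle between a and b, which is expected: the search makes no attempt to avoid cycles.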

Format and meaning of the filter specification

The format of a filter specification is:

Zero or more lines, in any order, each containing one of the following:

the keyword name-includes , a space, and a string to match

the keyword location-includes , a space, and a string to match

the keyword follower , a space, and a string which is a valid username

the keyword following , a space, and a string which is a valid username

Each of the lines has exactly one blank within it (the space separating the keyword from the

next string), and each keyword will appear at most once.

The filters are "additive" in that each filter step should be applied to the list that resulted from

the previous filter step. That is, you start with the list from the search specification, filter it

based on the first filter step to get a new list L1, filter L1 in the second filter step to get L2, and

so on.

The 'following' and 'follower' filters will keep users in the list if they are following a

particular user, or if a particular user is their follower. These filters are defined as:

The 'following' filter keeps only users who have the provided username in

their 'following' list.

The 'follower' filter keeps only users who appear in the 'following' list for the provided

username.
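
To make the difference between the two filters concrete, here is a minimal sketch. The function names filter_following and filter_follower and the sample data are assumptions made for illustration only.

```python
from typing import Any, Dict, List


def filter_following(data: Dict[str, Dict[str, Any]], usernames: List[str],
                     target: str) -> List[str]:
    """Keep the users who have target in their own 'following' list."""
    kept = []
    for user in usernames:
        if target in data[user]['following']:
            kept.append(user)
    return kept


def filter_follower(data: Dict[str, Dict[str, Any]], usernames: List[str],
                    target: str) -> List[str]:
    """Keep the users who appear in target's 'following' list."""
    kept = []
    for user in usernames:
        if user in data[target]['following']:
            kept.append(user)
    return kept


# Hypothetical data: a follows b and c; b follows c; c follows no one.
data = {
    'a': {'following': ['b', 'c']},
    'b': {'following': ['c']},
    'c': {'following': []},
}
kept_following = filter_following(data, ['a', 'b', 'c'], 'c')
kept_follower = filter_follower(data, ['a', 'b', 'c'], 'a')
```

With this data, the 'following c' filter keeps a and b (both follow c), while the 'follower a' filter keeps b and c (both are followed by a).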

The filters that have 'includes' as part of the keyword will do a simple substring search on

the string representing the relevant data for the user in the Twitterverse dictionary. This means

that if the given string occurs anywhere in a user's name (for a 'name-includes' filter) or a

user's location (for a 'location-includes' filter), then that user is to be kept in the list. The

substring search should be case-insensitive. For example, if the filter specifies users whose

locations include the string "USA", then users with location "USA", "Modesto, California, USA",

"kansas, usa" or "Musala, Bulgaria" would be kept, and users with location "United States of

America" would be excluded. This is far from perfect, but don't try to improve on it.
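
The case-insensitive substring test can be checked against the locations mentioned above with a short sketch. The helper name location_includes is hypothetical.

```python
def location_includes(location: str, text: str) -> bool:
    """Return True if text occurs anywhere in location, ignoring case."""
    return text.lower() in location.lower()


locations = [
    'USA',
    'Modesto, California, USA',
    'kansas, usa',
    'Musala, Bulgaria',
    'United States of America',
]
kept = [loc for loc in locations if location_includes(loc, 'USA')]
```

"Musala, Bulgaria" is kept because "usa" occurs inside "Musala", and "United States of America" is dropped because the three letters never appear together.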

Format and meaning of the presentation specification

The output from your program must include every user whose username is still in the list after

the search and filter specifications have been processed.

The presentation specification describes how the output should be presented. It consists of

these two lines, in order:

The keyword 'sort-by' , then a space, then one of these

keywords: 'username' , 'name' , 'popularity'

Either 'format short' or 'format long' .

The users must be listed in the order indicated by the 'sort-by' portion of the presentation

specification. Output that is sorted by username or name should be in alphabetical order, so

that strings starting with 'a' are at the beginning of the output. Although usernames are

unique, names are not. If any users have the same name, sort them by username. When sorting

users by popularity, the most popular users should come first. A user's popularity is defined to

be the number of followers they have. Not the number of people they are following! If any

users are tied for popularity, sort them by username.
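
The ordering rules above, including the tie-breaks, can be illustrated with Python's built-in sorted and a tuple key. This is an alternative way to express the rules, not the tweet_sort approach supplied in the starter code; the function names and the four-user dataset are assumptions, and names are compared with plain string comparison.

```python
from typing import Any, Dict, List


def popularity(data: Dict[str, Dict[str, Any]], username: str) -> int:
    """Return the number of users who have username in their 'following' list."""
    count = 0
    for other in data:
        if username in data[other]['following']:
            count += 1
    return count


def sort_by_popularity(data: Dict[str, Dict[str, Any]],
                       usernames: List[str]) -> List[str]:
    """Most followers first; ties are broken by username."""
    return sorted(usernames, key=lambda u: (-popularity(data, u), u))


def sort_by_name(data: Dict[str, Dict[str, Any]],
                 usernames: List[str]) -> List[str]:
    """Order by name; users with the same name are ordered by username."""
    return sorted(usernames, key=lambda u: (data[u]['name'], u))


# Hypothetical data: a and c each have two followers, b and d have none.
data = {
    'a': {'name': 'Zed', 'following': ['c']},
    'b': {'name': 'Amy', 'following': ['c', 'a']},
    'c': {'name': 'Amy', 'following': ['a']},
    'd': {'name': 'Bob', 'following': []},
}
by_popularity = sort_by_popularity(data, ['d', 'c', 'b', 'a'])
by_name = sort_by_name(data, ['d', 'c', 'b', 'a'])
```

Both functions return new lists, so the list passed in is left unchanged.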

If the presentation specification is for short format output, the output should consist of the

usernames as a Python list of strings. If the presentation specification is for long format

output, the output for each user should consist of a series of lines about the user, as shown in

the example below. The long format output must include a line of ten dashes between each

pair of users, as well as a line of dashes at the very top and very bottom of the output.

If there are no users to be presented (i.e., the 2nd parameter in get_present_string represents

an empty list), then the long format output should just include two lines of dashes.

For the long format, the series of lines describing a single user should be formatted as shown

below:

tomCruise

name: Tom Cruise

location: Los Angeles, CA

website: http://www.tomcruise.com

bio:

Official TomCruise.com crew tweets. We love you guys!

Visit us at Facebook!

following: ['katieH', 'NicoleKidman']

Notice that:

The name, location, and website text is reproduced exactly as it was in the Twitter data

file, with the addition of the labels at the start of the line.

The bio begins on the line after the line containing 'bio:' , unlike the other components

that appear on the same line as their "tag".

The bio itself is also reproduced exactly as it was in the Twitter data file, including upper

and lower case, newline characters, etc.

The "following" list is produced simply by converting the list of username strings to a

string.
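
A rough sketch of how the long format could be assembled is shown below. The names format_user and format_long are hypothetical, and the exact placement of newline characters (for example, a trailing newline after the last line of dashes, or the blank line produced by an empty bio) is an assumption that should be checked against the provided sample output.

```python
from typing import Any, Dict, List

DASHES = '-' * 10


def format_user(data: Dict[str, Dict[str, Any]], username: str) -> str:
    """Return the long-format lines for one user, ending with a newline."""
    info = data[username]
    return (username + '\n'
            + 'name: ' + info['name'] + '\n'
            + 'location: ' + info['location'] + '\n'
            + 'website: ' + info['web'] + '\n'
            + 'bio:\n' + info['bio'] + '\n'
            + 'following: ' + str(info['following']) + '\n')


def format_long(data: Dict[str, Dict[str, Any]], usernames: List[str]) -> str:
    """Return the long-format output: dashes at the top, after each user,
    and (for an empty list) a second line of dashes."""
    result = DASHES + '\n'
    for user in usernames:
        result += format_user(data, user) + DASHES + '\n'
    if not usernames:
        result += DASHES + '\n'
    return result


data = {
    'tomCruise': {
        'name': 'Tom Cruise',
        'bio': ('Official TomCruise.com crew tweets. We love you guys!\n'
                'Visit us at Facebook!'),
        'location': 'Los Angeles, CA',
        'web': 'http://www.tomcruise.com',
        'following': ['katieH', 'NicoleKidman'],
    }
}
long_output = format_long(data, ['tomCruise'])
empty_output = format_long(data, [])
```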

Full Example

If the user were to enter the name of the Twitter data file data.txt and the query query3.txt,

the entire interaction with the Twitterverse program should be:

Data file: data.txt

Query file: query3.txt

----------

PerezHilton

name: Perez Hilton

location: Hollywood, California

website: http://www.PerezH...

bio:

Perez Hilton is the creator and writer of one of the most famous websites

in the world. And he also loves music - a lot!

following: ['tomCruise', 'katieH', 'NicoleKidman']

----------

tomCruise

name: Tom Cruise

location: Los Angeles, CA

website: http://www.tomcruise.com

bio:

Official TomCruise.com crew tweets. We love you guys!

Visit us at Facebook!

following: ['katieH', 'NicoleKidman']

----------

If the query were changed to request short-format output, the entire interaction with your

program should be:

Data file: data.txt

Query file: query3.txt

['PerezHilton', 'tomCruise']

Required Functions

In the starter code file twitterverse_functions.py , complete the following function definitions. In

addition, you must add some helper functions to aid with the implementation of these required

functions. You will find more details on the Twitterverse dictionary, and query dictionary

format in the next section, and in the starter code file.

Function name: (Parameter types) -> Return type

Full Description (paraphrase to get a proper docstring description)

process_data: (TextIO) -> Dict[str, Dict[str, Any]]

The parameter represents a Twitter data file that is already open

for reading. Read the file and return the data in the Twitterverse

dictionary format. Note: in the docstring, do not provide example

calls for functions that read files.

process_query: (TextIO) -> Dict[str, Dict[str, Any]]

The parameter represents a query file that is already open for

reading. Read the file and return the query in the query dictionary

format. Note: in the docstring, do not provide example calls for

functions that read files.

all_followers: (Dict[str, Dict[str, Any]], str) -> List[str]

The first parameter represents the data in the Twitterverse

dictionary, and the second parameter represents a username.

Identify all the usernames that are following the user specified by

the second parameter and return them as a list. This is a helper

function for the sorting code below. You are also free to call it in

the code that you write too.

get_search_results: (Dict[str, Dict[str, Any]], Dict[str, Any]) -> List[str]

The first parameter represents the data in the Twitterverse

dictionary format, and the second parameter represents the

search specification in the search specification dictionary format.

Perform the specified search on the given Twitter data, and return

a list of strings representing usernames that match the search

criteria. The parameters should not be modified by the function.

get_filter_results: (Dict[str, Dict[str, Any]], List[str], Dict[str, str]) -> List[str]

The first parameter represents the data in the Twitterverse

dictionary format, the second parameter represents a list of

usernames, and the third parameter represents the filter

specification in the filter specification dictionary format. Apply the

specified filters to the given username list to determine which

usernames to keep, and return the resulting list of usernames. The

parameters should not be modified by the function.

get_present_string: (Dict[str, Dict[str, Any]], List[str], Dict[str, str]) -> str

The first parameter represents the data in the Twitterverse

dictionary format, the second parameter represents a list of

usernames, and the third parameter represents the presentation

specification in the presentation specification dictionary format.

Format the results for presentation based on the given

presentation specification and return the formatted string. The

parameters should not be modified by the function.

Data Structures

To increase the readability of our code, we will use dictionaries to represent both the Twitter

data and the query in our program. In this section, we'll look at the format of the data and

query dictionaries.

The Twitterverse Data Dictionary

The type of the Twitterverse dictionary is Dict[str, Dict[str, object]] . More specifically, each

key in the Twitterverse dictionary is the username of a Twitter user, and the value associated

with that key is a dictionary containing additional information about the user. Each of these

inner dictionaries will contain the keys 'name' , 'location' , 'web' , 'bio' , and 'following' . The

value associated with the 'following' key is a list of zero or more strings, and the values

associated with the rest of the keys are strings (and may be the empty string).

For example, if the following user information is included in our Twitter data file...

tomCruise

Tom Cruise

Los Angeles, CA

http://www.tomcruise.com

Official TomCruise.com crew tweets. We love you guys!

Visit us at Facebook!

ENDBIO

katieH

NicoleKidman

END

then the following key/value pair will be in our Twitterverse dictionary:

'tomCruise': {'name': 'Tom Cruise',

'bio': 'Official TomCruise.com crew tweets. We love you guys!\nVisit us at Facebook!',

'location': 'Los Angeles, CA',

'web': 'http://www.tomcruise.com',

'following': ['katieH', 'NicoleKidman']}

And another example: if the following user information is included in our Twitter data file (note

the blank lines for name, location, and web, and zero lines for bio and following):

quietTweeter

ENDBIO

END

then this key/value pair will be in our Twitterverse dictionary.

'quietTweeter': {'name': '',

'bio': '',

'location': '',

'web': '',

'following': []}

Note that dictionary equality does not depend on the order of the items, so don't worry if your

dictionary output is in a different order, as long as the items are the same.

The Query Dictionary

We will also use dictionaries to represent queries in our program. The query dictionary

contains three items: the key 'search' , whose value is a dictionary representing the search

specification, the key 'filter' , whose value is a dictionary representing the filter

specification, and the key 'present' , whose value is a dictionary representing the presentation

specification.

The search specification dictionary contains the keys 'username' and 'operations' , whose

values are the string representing the username to start at, and a list of strings representing

the operations to perform, respectively. Note that the list that is the value associated with the

key 'operations' may be empty, if there is only one line between

the SEARCH and FILTER keywords in the query file. If there is only one line between SEARCH

and FILTER, then it represents the username to start at, and there are no operations to be

performed in the search step. The operations must be performed in the order that they are

listed in the file. (Note the change in results if the two operations are reversed in the Full

Example using query3.txt .)

The filter specification is a dictionary with strings representing filter keywords as keys, and

the strings to match as values. Note that this dictionary may be empty if there are no filters to

apply, and that each filter keyword will appear at most once in the filter section of a query file.

The presentation specification is a dictionary with keys 'sort-by' and 'format' , and string

values representing the order to sort the results, and the format to display them in,

respectively.

For example, if our query file looks like this:

SEARCH

tomCruise

following

following

following

FILTER

following KatieH

location-includes USA

PRESENT

sort-by popularity

format long

then the query dictionary will be this (again, the order of items may vary):

{'search': {'username': 'tomCruise', 'operations': ['following', 'following', 'following']},

'filter': {'following': 'KatieH', 'location-includes': 'USA'},

'present': {'sort-by': 'popularity', 'format': 'long'}}

Note that there are other ways that we could have chosen to represent the data and the query

in our program. However using dictionaries instead of, for example, lists, means that our code

contains string literal keys that are descriptive of what is being represented. For example,

using the format above, we will refer to the search specification in the query dictionary

using query['search'] , rather than query[0] , as we might if we were using a list or tuple of

query information.
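
As an illustration of how a query file maps onto the query dictionary, here is a minimal sketch. The function name read_query is hypothetical, io.StringIO stands in for an opened query file, and the sketch assumes a valid query file with one item per line and no blank lines between items.

```python
import io
from typing import Any, Dict, TextIO


def read_query(query_file: TextIO) -> Dict[str, Dict[str, Any]]:
    """Return the query dictionary for a query file open for reading."""
    query = {'search': {}, 'filter': {}, 'present': {}}

    query_file.readline()  # skip the SEARCH keyword line
    query['search']['username'] = query_file.readline().strip()
    operations = []
    line = query_file.readline().strip()
    while line != 'FILTER':
        operations.append(line)
        line = query_file.readline().strip()
    query['search']['operations'] = operations

    # Each filter line is a keyword, one space, and the string to match.
    line = query_file.readline().strip()
    while line != 'PRESENT':
        keyword, value = line.split(' ', 1)
        query['filter'][keyword] = value
        line = query_file.readline().strip()

    # readline returns '' at the end of the file, which ends this loop.
    line = query_file.readline().strip()
    while line != '':
        keyword, value = line.split(' ', 1)
        query['present'][keyword] = value
        line = query_file.readline().strip()
    return query


sample = io.StringIO(
    'SEARCH\ntomCruise\nfollowing\nfollowing\nfollowing\n'
    'FILTER\nfollowing KatieH\nlocation-includes USA\n'
    'PRESENT\nsort-by popularity\nformat long\n'
)
query = read_query(sample)
```

The descriptive keys then make later code easy to read; for example, query['search']['operations'] is the list of search steps.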

The Main Program

Once you have correctly implemented the functions in twitterverse_functions.py , execution of

the main program ( twitterverse_program.py ) will:

1. Read a data file and produce the Twitterverse data dictionary.

2. Read a query file and produce the query dictionary.

3. Compute the results of the search operations.

4. Apply the filters to the search results.

5. Create the presentation string.

Required Testing ( unittest )

Write (and submit) a set of unittests for the get_filter_results function. Name this

file test_get_filter_results.py . For each test method, include a brief docstring description

specifying what is being tested. For unittest methods, we do not include a type contract and

the docstring description should not include example calls.
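
The general shape of such a test file might look like the sketch below. The get_filter_results defined here is only a placeholder so that the example is self-contained (it handles just the 'location-includes' keyword); in the real test file the function would come from twitterverse_functions . The test data and method names are made up for illustration.

```python
import unittest
from typing import Any, Dict, List


def get_filter_results(data: Dict[str, Dict[str, Any]], usernames: List[str],
                       filter_spec: Dict[str, str]) -> List[str]:
    """Placeholder for the function under test; it only understands the
    'location-includes' keyword and keeps everyone when it is absent."""
    text = filter_spec.get('location-includes', '')
    kept = []
    for user in usernames:
        if text.lower() in data[user]['location'].lower():
            kept.append(user)
    return kept


class TestGetFilterResults(unittest.TestCase):
    """Example tests for get_filter_results."""

    def setUp(self):
        """Build a tiny hypothetical Twitterverse dictionary."""
        self.data = {
            'a': {'name': 'A', 'location': 'kansas, usa', 'web': '',
                  'bio': '', 'following': ['b']},
            'b': {'name': 'B', 'location': 'Toronto', 'web': '',
                  'bio': '', 'following': []},
        }

    def test_empty_filter(self):
        """Test that an empty filter specification keeps every username."""
        actual = get_filter_results(self.data, ['a', 'b'], {})
        self.assertEqual(actual, ['a', 'b'])

    def test_location_includes_ignores_case(self):
        """Test that location-includes matches regardless of case."""
        actual = get_filter_results(self.data, ['a', 'b'],
                                    {'location-includes': 'USA'})
        self.assertEqual(actual, ['a'])

    def test_arguments_not_modified(self):
        """Test that the username list argument is left unchanged."""
        usernames = ['a', 'b']
        get_filter_results(self.data, usernames, {'location-includes': 'USA'})
        self.assertEqual(usernames, ['a', 'b'])
```

A test file like this is usually run by calling unittest.main() under an if __name__ == '__main__': guard at the bottom of the file.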

Files to Download

All of the files included in the download for the assignment are listed in this section. These

files must all be placed in the same folder. Download the files by right-clicking on the link and

using your web-browser's "Save as" option. Do not use "cut and paste" as sometimes this can

cause undesirable formatting characters to be added to our Python code.

Please download the Assignment 3 Files and extract the zip archive. The following

paragraphs explain the files you have been given.

Python Code

The starter code for this assignment is in the file twitterverse_functions.py . You need to

complete this file.

The main program is in the file twitterverse_program.py . It is complete and must not be

changed.

The starter code for the required unittests is in the file test_get_filter_results.py . You

need to complete the file.

Sample Twitter Data and Sample Queries

We have provided a number of text files containing Twitter data, and queries on that

data. All files in the provided zip file that contain data in their filename are Twitter

data files, and all files in the provided zip file that contain query in their filename are

query files.

The query files query1.txt , query2.txt , and query3.txt are queries on the data

in data.txt , and the file query4.txt is a query on the data in rdata.txt .

MarkUs Tester

We are providing an automatic student tester via MarkUs that tests two things:

whether your code follows the Python style guidelines (note: this is an additional

resource!), and

whether your functions: have the correct parameter(s), are named correctly, and have the

correct return types.

The MarkUs Tester does not test the correctness of your functions, so you must do that

yourself.

Additional Requirements

Do not add statements that call print , input , or open , or use an import statement.

Do not use any break or continue statements. We are imposing this restriction (and we

have not even taught you these statements) because, for a beginner programmer, they are

very easy to abuse, resulting in terrible code.

Do not modify or add to the import statements provided in the starter code.

Error checking

You may assume that the files actually contain a valid Twitter data file and a valid query file,

respectively. You do not need to check either file to make sure that it is a valid Twitter or

query file.

You may be wondering how you can be sure that the content of the Twitter data file is valid.

For example, there are probably rules about legal Twitter usernames and there are certainly

rules about valid URLs. Don't confirm that the username, URL, or any other elements of the

Twitter data file are valid. Simply read whatever string is in the relevant position in the file and

use it as is.

Reading the Twitter data file

Typically, the easiest way to read through a data file is with a for-loop, but because of the

structure of a Twitter data file, you will need to use while loops.

Watch out for how you handle \n (newline) characters. If you use readline to get a line of the

file, it will have a \n on the end (except possibly on the last line in the file). You need to be

aware of that before you try to compare it to keywords like END . On the other hand, there are

times when you really do need a \n . For instance, when you store the contents of a user's bio,

it must include any newline characters that appear in the middle of the bio, so that you are

able to display it in the same way when using the long format presentation.
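
The advice above can be sketched as a pair of nested while loops driven by readline . This is an illustration under stated assumptions, not the required implementation: the function name read_profiles is hypothetical, io.StringIO stands in for an opened data file, and stripping whitespace from the ends of the bio is an assumption chosen to match the example dictionary shown earlier.

```python
import io
from typing import Any, Dict, TextIO


def read_profiles(data_file: TextIO) -> Dict[str, Dict[str, Any]]:
    """Return a Twitterverse-style dictionary for a file open for reading."""
    data = {}
    username = data_file.readline().strip()
    while username != '':  # readline returns '' at the end of the file
        profile = {}
        profile['name'] = data_file.readline().strip()
        profile['location'] = data_file.readline().strip()
        profile['web'] = data_file.readline().strip()

        # Keep the newline characters between bio lines; only the
        # comparison against the keyword uses the stripped line.
        bio = ''
        line = data_file.readline()
        while line.strip() != 'ENDBIO':
            bio += line
            line = data_file.readline()
        profile['bio'] = bio.strip()

        following = []
        line = data_file.readline().strip()
        while line != 'END':
            following.append(line)
            line = data_file.readline().strip()
        profile['following'] = following

        data[username] = profile
        username = data_file.readline().strip()
    return data


sample = io.StringIO(
    'tomCruise\nTom Cruise\nLos Angeles, CA\nhttp://www.tomcruise.com\n'
    'Official TomCruise.com crew tweets. We love you guys!\n'
    'Visit us at Facebook!\nENDBIO\nkatieH\nNicoleKidman\nEND\n'
    'quietTweeter\n\n\n\nENDBIO\nEND\n'
)
twitter_data = read_profiles(sample)
```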

Sorting your output

Part of processing the presentation specification requires you to sort the list of users

according to username, name, or popularity. Since you will be sorting a list (of string

usernames), you can use list.sort to accomplish the task. However, by default list.sort uses

simple < to compare pairs of list elements. We know that < on strings compares them

alphabetically. So list.sort will simply sort the strings into alphabetical order. In some cases,

that is not the ordering you want.

Fortunately, there are ways to get around this. In the starter code for twitterverse_functions.py ,

we have provided our own implementation of a sorting function that has a parameter that you

can use to control the ordering. All you have to do is send tweet_sort a function that takes

two parameters of the same type as the elements in your list (in this case, the two parameters

would be username strings) and returns:

-1 if the first one should appear before the second one in a sorted list,

1 if the first one should appear after the second one in a sorted list, and

0 if they are "tied", so that it doesn't matter which one comes first.

We have also provided some functions that return -1, 1, or 0 for each possible sorting case.

The simple module sort_demo.py demonstrates how this can be done. It is a good example of

passing a function to a function. We strongly recommend you spend some time understanding

this code before you work on the sorting part of the presentation specification.
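
The idea of passing a comparison function to a sorting function can also be tried out with the standard library. The sketch below is not the provided tweet_sort or sort_demo.py code: it uses functools.cmp_to_key with the built-in sorted as a stand-in, assumes the usual convention that -1 means the first argument comes first, and uses a made-up table of follower counts.

```python
from functools import cmp_to_key

# Hypothetical follower counts; in the assignment these would be derived
# from the Twitterverse dictionary.
FOLLOWER_COUNTS = {'a': 2, 'b': 0, 'c': 2, 'd': 5}


def more_popular(first: str, second: str) -> int:
    """Return -1 if first has more followers, 1 if fewer, 0 if tied."""
    if FOLLOWER_COUNTS[first] > FOLLOWER_COUNTS[second]:
        return -1
    if FOLLOWER_COUNTS[first] < FOLLOWER_COUNTS[second]:
        return 1
    return 0


def username_first(first: str, second: str) -> int:
    """Return -1 if first sorts before second as a string, 1 if after, else 0."""
    if first < second:
        return -1
    if first > second:
        return 1
    return 0


def by_popularity(first: str, second: str) -> int:
    """Compare by popularity, breaking ties by username."""
    result = more_popular(first, second)
    if result == 0:
        result = username_first(first, second)
    return result


# The comparison function is passed as a value, without calling it.
ordered = sorted(['c', 'b', 'a', 'd'], key=cmp_to_key(by_popularity))
```

Swapping by_popularity for username_first changes the ordering without touching the sorting code itself, which is the point of passing the function as an argument.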

Other ways to work with functions

There will be other places where you can eliminate a great deal of duplication in your code by

passing a function as an argument to a function.

Functions can be used similarly to variables. In particular, we can use them as values in

dictionaries. You may find the example in the module dict_of_funcs_demo.py useful in

understanding how to call different functions in different cases, without having to write

lengthy if/elif/else blocks.
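
A minimal sketch of the dictionary-of-functions idea follows. It is not the content of dict_of_funcs_demo.py ; the names CHECKS and keep_user are hypothetical, and only the two 'includes' keywords are covered.

```python
from typing import Any, Dict


def name_includes(info: Dict[str, Any], text: str) -> bool:
    """Return True if text occurs in the user's name, ignoring case."""
    return text.lower() in info['name'].lower()


def location_includes(info: Dict[str, Any], text: str) -> bool:
    """Return True if text occurs in the user's location, ignoring case."""
    return text.lower() in info['location'].lower()


# Map each filter keyword to the function that checks it. The values are
# the functions themselves, not the results of calling them.
CHECKS = {
    'name-includes': name_includes,
    'location-includes': location_includes,
}


def keep_user(info: Dict[str, Any], filters: Dict[str, str]) -> bool:
    """Return True if the user passes every filter that CHECKS knows about."""
    keep = True
    for keyword in filters:
        if keyword in CHECKS and not CHECKS[keyword](info, filters[keyword]):
            keep = False
    return keep


info = {'name': 'Tom Cruise', 'location': 'Los Angeles, CA'}
passes_both = keep_user(info, {'name-includes': 'tom', 'location-includes': 'CA'})
fails_location = keep_user(info, {'location-includes': 'USA'})
```

Looking the function up by keyword replaces a chain of if/elif branches with a single dictionary access.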

